Metabarcoding Singapore
ASV from Dada2

1 Aim

  • Assign and analyze eukaryotes for Singapore metabarcoding data (ASV assigned with dada2 as implemented on Mothur).
  • Do some analyzes with the prokaryotes too…

2 Initialize

This file defines all the necessary libraries and variables

3 Read the data

3.2 Read the files

  • The dada2 treatment has already removed the forward and reverse primers, so no need to remove them
  • Work with the unrarefied data
# Read the sample and metadata tables
  sample_table <- read_excel(metadata_xlsx, sheet="samples", range="A1:D89") 

  metadata_table  <- read_csv(metadata_csv, na=c("ND", "")) %>% 
    dplyr::rename(sample_code=Sample, day_number=Day_number, date=Date, location=Location, monsoon=`Monsoon period`) %>% 
    select(-Strait) %>% 
    mutate(date = lubridate::parse_date_time(date,"dmy"))
  
  station_table  <- read_excel(metadata_xlsx, sheet="stations", na=c("ND", ""))

  sample_table <- sample_table %>% 
    left_join(metadata_table) %>%
    left_join(station_table) %>% 
    mutate(sample_label = str_c(strait_label,location_label,
                                monsoon,sprintf("%03d",day_number), 
                                sep="_"))

# Read the taxonomy table
  taxo_table <- read_tsv(taxo_file)

# Clean up the taxonomy
  taxo_table <- taxo_table %>% 
    mutate(taxo_clean = str_replace_all(Taxon, "D_[0-9]+__","")) %>% 
    separate(col=taxo_clean, into=str_c("taxo", c(1:7)), sep=";") %>% 
    rename(otu_name = `Feature ID`)
  
# Read the otu table
  otu_table <- read_tsv(otu_file, skip=1) %>%  # Jump the first line
    rename(otu_name = `#OTU ID`) %>% 
    mutate(otu_id = str_c("otu_", sprintf("%04d",row_number())))

# Read the sequences
  otu_sequences <- readAAStringSet(sequence_file)
  otu_sequences.df <- data.frame (otu_name=names(otu_sequences),sequence=as.character(otu_sequences))

# Remove the primers - Not necessary because the primers have been removed  
  # fwd_length = 20
  # rev_length = 15
  # otu_sequences.df <- otu_sequences.df %>% 
  #   separate (col=names, into=c("otu_id_qiime", "otu_rep_seq"), sep=" ") %>%
  #   mutate (sequence = str_sub(sequence, start=fwd_length+1, end = - rev_length - 1))

  otu_table <- taxo_table %>%  
    left_join(otu_table) %>% 
    left_join(otu_sequences.df) %>% 
    arrange(otu_id)
  
# Write a fasta file for blast with all taxonomy roups
 # otu_sequences <- otu_table %>% transmute(sequence=sequence, seq_name=otu_id)
 # fasta_write(otu_sequences, file_name="../qiime/otu_rep_98_all.fasta", compress = FALSE, taxo_include = FALSE)  

5 Process BLAST file

BLAST is performed on Roscoff ABIMS server

6 Map

Use leaflet

7 Environmental data

7.1 Per station

8 Phyloseq analysis

8.2 Break up into photosynthetic and non-photosynthetic

  • Opisthokonta (Metazoa, Fungi) are removed

Phyloseq Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 668 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 668 taxa by 8 taxonomic ranks ]

Phyloseq Photosynthetic Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

Phyloseq Heterotrophic Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 397 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 397 taxa by 8 taxonomic ranks ]

8.4 Normalize number of reads in each sample using median sequencing depth.

  • ! If there no cells do not transform, just set column to 0 function(x, t=total_hetero) (if(sum(x) > 0){ t * (x / sum(x))} else {x})

The median number of reads used for normalization of eukaryotes (auto+hetero) is  4735
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 668 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 668 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes autotrophs is  3038
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes heterotrophs is  983
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 397 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 397 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of prokaryotes is  54273
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2079 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2079 taxa by 8 taxonomic ranks ]

8.5 Phyloseq files for abundant taxa

  • Remove taxa that are < 0.10 (euks) and <0.05 (proks) in any given sample
  • Normalize again…
Remove taxa in low abundance 

The median number of reads used for normalization of eukaryotes (auto+hetero) is  3359
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 60 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 60 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes autotrophs is  2767
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 60 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 60 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes heterotrophs is  694
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 83 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 83 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of prokaryotes is  24353
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 44 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 44 taxa by 8 taxonomic ranks ]

8.14 Heatmaps

8.15 NMDS

Sample removed because they were pulling the NMDS * PR2X16XS21 it has a single eukaryote (diatom bloom ) * RM13XS36 cause problem for bacteria * PR11XS25 cause problem for hetero euks * SBW11XS26 cause problem for hetero euks * SBW13XS37 cause problem for hetero euks * RM13XS36 cause problem for hetero euks

Define function

8.15.1 All OTUs

Square root transformation
Wisconsin double standardization
Run 0 stress 0.12 
Run 1 stress 0.13 
Run 2 stress 0.13 
Run 3 stress 0.13 
Run 4 stress 0.14 
Run 5 stress 0.13 
Run 6 stress 0.14 
Run 7 stress 0.14 
Run 8 stress 0.12 
... New best solution
... Procrustes: rmse 0.046  max resid 0.15 
Run 9 stress 0.14 
Run 10 stress 0.13 
Run 11 stress 0.13 
Run 12 stress 0.13 
Run 13 stress 0.13 
Run 14 stress 0.12 
Run 15 stress 0.13 
Run 16 stress 0.13 
Run 17 stress 0.14 
Run 18 stress 0.14 
Run 19 stress 0.14 
Run 20 stress 0.13 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.12 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.19 
Run 1 stress 0.19 
Run 2 stress 0.18 
... New best solution
... Procrustes: rmse 0.049  max resid 0.28 
Run 3 stress 0.19 
Run 4 stress 0.19 
Run 5 stress 0.19 
Run 6 stress 0.18 
Run 7 stress 0.2 
Run 8 stress 0.18 
Run 9 stress 0.19 
Run 10 stress 0.2 
Run 11 stress 0.18 
... Procrustes: rmse 0.039  max resid 0.2 
Run 12 stress 0.19 
Run 13 stress 0.19 
Run 14 stress 0.19 
Run 15 stress 0.19 
Run 16 stress 0.19 
Run 17 stress 0.18 
Run 18 stress 0.19 
Run 19 stress 0.2 
Run 20 stress 0.19 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.18 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.18 
Run 1 stress 0.17 
... New best solution
... Procrustes: rmse 0.051  max resid 0.25 
Run 2 stress 0.18 
Run 3 stress 0.17 
... New best solution
... Procrustes: rmse 0.051  max resid 0.22 
Run 4 stress 0.18 
Run 5 stress 0.18 
Run 6 stress 0.19 
Run 7 stress 0.18 
Run 8 stress 0.18 
Run 9 stress 0.17 
... New best solution
... Procrustes: rmse 0.036  max resid 0.2 
Run 10 stress 0.18 
Run 11 stress 0.17 
Run 12 stress 0.18 
Run 13 stress 0.18 
Run 14 stress 0.18 
Run 15 stress 0.18 
Run 16 stress 0.17 
... Procrustes: rmse 0.033  max resid 0.17 
Run 17 stress 0.18 
Run 18 stress 0.18 
Run 19 stress 0.17 
Run 20 stress 0.17 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.17 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

8.15.2 Abundant OTUs

Square root transformation
Wisconsin double standardization
Run 0 stress 0.12 
Run 1 stress 0.13 
Run 2 stress 0.15 
Run 3 stress 0.15 
Run 4 stress 0.14 
Run 5 stress 0.12 
Run 6 stress 0.14 
Run 7 stress 0.15 
Run 8 stress 0.12 
Run 9 stress 0.16 
Run 10 stress 0.12 
... Procrustes: rmse 0.00012  max resid 9e-04 
... Similar to previous best
Run 11 stress 0.12 
Run 12 stress 0.15 
Run 13 stress 0.15 
Run 14 stress 0.14 
Run 15 stress 0.15 
Run 16 stress 0.14 
Run 17 stress 0.15 
Run 18 stress 0.13 
Run 19 stress 0.12 
Run 20 stress 0.15 
*** Solution reached

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.12 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.17 
Run 1 stress 0.17 
Run 2 stress 0.17 
Run 3 stress 0.17 
Run 4 stress 0.17 
Run 5 stress 0.17 
Run 6 stress 0.17 
Run 7 stress 0.17 
Run 8 stress 0.17 
Run 9 stress 0.17 
Run 10 stress 0.17 
Run 11 stress 0.17 
Run 12 stress 0.17 
Run 13 stress 0.18 
Run 14 stress 0.17 
Run 15 stress 0.17 
Run 16 stress 0.17 
... Procrustes: rmse 0.049  max resid 0.26 
Run 17 stress 0.17 
Run 18 stress 0.17 
Run 19 stress 0.17 
Run 20 stress 0.17 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.17 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.16 
Run 1 stress 0.16 
Run 2 stress 0.17 
Run 3 stress 0.17 
Run 4 stress 0.16 
Run 5 stress 0.17 
Run 6 stress 0.18 
Run 7 stress 0.17 
Run 8 stress 0.17 
Run 9 stress 0.16 
Run 10 stress 0.16 
Run 11 stress 0.16 
... New best solution
... Procrustes: rmse 0.056  max resid 0.25 
Run 12 stress 0.17 
Run 13 stress 0.17 
Run 14 stress 0.17 
Run 15 stress 0.16 
Run 16 stress 0.16 
Run 17 stress 0.17 
Run 18 stress 0.17 
Run 19 stress 0.17 
Run 20 stress 0.17 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.16 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Daniel Vaulot, Adriana Lopes dos Santos

29 11 2018